Spatial analysis of the occurrence of the western conifer seed bug Leptoglossus occidentalis (Heteroptera: Coreidae) in Europe based on multiple environmental variables

Abstract The western conifer seed bug (WCSB) Leptoglossus occidentalis (Heidemann) (Heteroptera: Coreidae) is a pest insect that causes significant losses of coniferous trees worldwide. In this study, we sought to project the potential distribution of the WCSB based on dual CLIMEX modeling and random forest (RF) analysis to obtain basic data for WCSB monitoring strategies. The CLIMEX model, a semimechanistic niche model that responds to climate-based environmental parameters, is a species distribution model that focuses on regional climatic suitability. Given that this model can be used to select areas that are likely to reflect the climatically favorable spread of species, we first used CLIMEX to evaluate the potential distribution of the WCSB. The RF algorithm was used to predict the potential occurrence of WCSB and to evaluate the relative importance of environmental variables for WCSB occurrence. Using the RF model, land cover was found to be the most important variable for classifying the presence/pseudo-absence of the WCSB, with an accuracy of 77.1%. Climatic suitability for the WCSB was predicted to be 2.4-fold higher in Southern Europe than in Western Europe, and the WCSB was predicted to occur primarily near coniferous forests. Given that CLIMEX and RF analyses yielded different prediction results, using the findings of both models may compensate for the shortcomings of each model when used independently. Consequently, to ensure greater prediction reliability, we believe that it would be beneficial to base predictions on the combined potential distribution data obtained using both modeling approaches.


| INTRODUCTION
The western conifer seed bug (WCSB) Leptoglossus occidentalis (Heidemann) (Heteroptera: Coreidae) is an insect pest native to North America that causes severe damage to the seeds of coniferous trees (Fent & Kment, 2011; Koerber, 1963; McPherson et al., 1990; Tylor et al., 2001). The distribution range of this species has since expanded rapidly throughout eastern North America, Europe, and East Asia (Ahn et al., 2013; Lesieur et al., 2019; McPherson et al., 1990; Villa et al., 2001), and in Europe in particular, it has been a source of notable losses in the agricultural and forestry sectors.
Recently, economic losses of forest resources and adverse effects on forest ecosystems have been reported in regions including South America and northern Africa (Gapon, 2013; Lesieur et al., 2019; Olivera et al., 2020).
The WCSB uses coniferous trees as a shelter and as the food source necessary for maintaining its life cycle, even in harsh environments (Fent & Kment, 2011). Climate affects the development and survival rates of the WCSB, meaning that it is a determinant of whether a pest population can establish in a new area (Barta, 2016; Byeon et al., 2021; Park et al., 2016; Vincenzi et al., 2011). Thus, host plants and climate have been used as important factors in evaluating the establishment of WCSB (Byeon et al., 2021; Fent & Kment, 2011).
However, only a few currently employed approaches can provide fundamental information regarding the regional potential occurrence and distribution of pest species (Desneux et al., 2022; Raffini et al., 2020). Among these, species distribution modeling has been used to obtain ecological and evolutionary insights by predicting the potential distribution of a species as a function of environmental variables (Bosso et al., 2022; Elith & Leathwick, 2009; Li et al., 2022). There are two main types of species distribution models. Correlative models use statistical and machine learning approaches to define the realized niche by relating species occurrence to environmental variables (Evans et al., 2011). Mechanistic models evaluate the fundamental niche based on physiological processes encoded through the interaction of parameters and data (Evans et al., 2011; Kearney et al., 2010). Correlative models can be developed with occurrence data only and define a conservative potential distribution through a relatively simple process (Valavi et al., 2021). By contrast, mechanistic models require knowledge of species biology for parameter estimation and repeated adjustment during model simulation, which makes them relatively complex compared with correlative models. However, mechanistic models can identify areas that a species could biologically tolerate, in addition to regions that share similar environments and occurrence mechanisms (Byeon et al., 2021; Kearney et al., 2010; Kriticos et al., 2015). Consequently, it is necessary to select an adequate type of species distribution model by considering the available data, the target species, and the purpose of the study (Lee et al., 2022).
In this study, we aimed to achieve reliable predictions using a multiple modeling approach with different algorithms, which has not previously been applied to the WCSB. CLIMEX software was used to validate the projections of the climate-species correlation by directly incorporating the biological data of WCSB, whereas random forest (RF) was employed to evaluate possible habitats based on the present range of the WCSB. Hence, our primary objective was to evaluate the potential distribution of the WCSB in Europe by using these two algorithms as a function of environmental variables and to identify the variables confining WCSB distribution. Furthermore, we compared the potential distribution projected from occurrence coordinates with that developed from biological data to demonstrate the effect of environmental factors other than climate on the occurrence of WCSB.

| The CLIMEX model
In this study, we used CLIMEX (version 4.0; Hearne Software) to evaluate the climate-based potential distribution of the WCSB in Europe. The use of CLIMEX requires the estimation of parameters based on the biological characteristics of a target species and climate data pertaining to the region for which species distribution is to be predicted (Kriticos et al., 2015). There are two primary indices in the CLIMEX model: the Growth Index (GI) and the Stress Index (SI), which describe the potential for growth and for inhibition, respectively, under the given climatic conditions. The GI is calculated by multiplying various sub-indices, such as the Temperature Index (TI) and Moisture Index (MI), whereas the SI comprises Cold Stress (CS), Heat Stress (HS), Dry Stress (DS), and Wet Stress (WS). The TI requires four parameters: the lower temperature threshold (DV0), lower optimal temperature (DV1), upper optimal temperature (DV2), and upper temperature threshold (DV3).
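As an illustration of how the four temperature parameters shape the TI, the following minimal sketch assumes a simple trapezoidal response (0 outside DV0 to DV3, 1 between DV1 and DV2, linear in between). The function name and the parameter values are hypothetical and are not the values in Table 1; CLIMEX itself evaluates the response over the daily temperature cycle, which is not reproduced here.

```python
def temperature_index(temp_c, dv0, dv1, dv2, dv3):
    """Simplified trapezoidal temperature response in the range 0 to 1."""
    if temp_c <= dv0 or temp_c >= dv3:
        return 0.0                                # too cold or too hot for growth
    if temp_c < dv1:
        return (temp_c - dv0) / (dv1 - dv0)       # rising limb
    if temp_c <= dv2:
        return 1.0                                # optimal range
    return (dv3 - temp_c) / (dv3 - dv2)           # falling limb


# Hypothetical parameter values, for illustration only.
DV0, DV1, DV2, DV3 = 13.0, 20.0, 28.0, 35.0

for temp in (10.0, 16.5, 24.0, 31.5, 36.0):
    print(temp, round(temperature_index(temp, DV0, DV1, DV2, DV3), 2))
```

In this simplified view, widening the DV1 to DV2 interval enlarges the range of temperatures treated as fully suitable, whereas lowering DV0 extends marginal suitability toward cooler sites.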
In this study, we estimated the parameter values from DV0 to DV3 and the population degree-days (PDD; the minimum degree-days above DV0 necessary to complete one generation) based on the lower developmental temperatures and accumulated temperature (degree-days) of WCSB reported in a previous study (Barta, 2016). By contrast, the other parameters were set to give the best fit to the actual distribution of WCSB in North America because no data were available for estimating them. Given that most of the distribution coordinates for WCSB are located in North America and Europe, the parameters were calibrated for North America, and the resulting parameter set was applied to Europe. However, when PDD was used, we were unable to project a potential distribution that adequately covered the actual distribution of WCSB. Consequently, we performed CLIMEX modeling both with and without consideration of PDD. The approach for setting the parameter values has been described elsewhere (Jung et al., 2016; Kriticos et al., 2015), and all parameter values used are listed in Table 1.
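The role of PDD can be illustrated with a minimal degree-day sketch. It assumes the simple daily-average method (mean of daily minimum and maximum minus the base temperature), which is not necessarily the calculation used inside CLIMEX. The base temperature of 14°C and the requirement of 533 DD are the approximate values attributed to Barta (2016) in the Discussion; the temperature series and function names are hypothetical.

```python
def annual_degree_days(daily_min_max, base_temp):
    """Sum daily degree-days above base_temp using the simple averaging method."""
    total = 0.0
    for t_min, t_max in daily_min_max:
        mean_temp = (t_min + t_max) / 2.0
        total += max(0.0, mean_temp - base_temp)   # days below the base add nothing
    return total


def generations_possible(degree_days, pdd):
    """Number of complete generations supported by the accumulated degree-days."""
    return int(degree_days // pdd)


# Hypothetical year: 100 warm days (12 to 22 deg C) and 265 cool days (0 to 10 deg C).
year = [(12.0, 22.0)] * 100 + [(0.0, 10.0)] * 265

dd = annual_degree_days(year, base_temp=14.0)
print(dd, generations_possible(dd, pdd=533.0))
```

When the accumulated degree-days fall short of PDD, no generation can be completed, which corresponds to a site being excluded when the PDD constraint is switched on.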
On the basis of the estimated parameter values and regional climate data (described below), CLIMEX calculates the Ecoclimatic Index (EI) by multiplying GI and SI. The EI is a quantitative measure of the climatic suitability of a specific area for a given target species.
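A minimal sketch of this multiplication is given below. It assumes that the annual GI is scaled from 0 to 100 and that each stress value (also 0 to 100) reduces the SI multiplicatively as (1 − stress/100), which follows the general description in the CLIMEX documentation; stress interaction terms are omitted, and the function names and input values are hypothetical.

```python
def stress_index(cs, hs, ds, ws):
    """Combine cold, heat, dry, and wet stress (each 0 to 100) into a 0 to 1 multiplier."""
    si = 1.0
    for stress in (cs, hs, ds, ws):
        si *= max(0.0, 1.0 - stress / 100.0)   # a stress of 100 or more excludes the species
    return si


def ecoclimatic_index(annual_gi, cs, hs, ds, ws):
    """EI as annual growth potential (0 to 100) scaled down by the stress index."""
    return annual_gi * stress_index(cs, hs, ds, ws)


# Hypothetical site: moderate growth potential with some cold stress only.
print(round(ecoclimatic_index(annual_gi=40.0, cs=20.0, hs=0.0, ds=0.0, ws=0.0), 1))
```

Because the terms are multiplied, a single lethal stress drives the EI to zero regardless of how favorable the growing season is.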

| Distribution of the western conifer seed bug in Europe
First observed in northern Italy in 1999, the WCSB is now distributed in approximately 30 European countries (Barta, 2016; CABI, 2019). Recently, large numbers of WCSB occurrences have been reported in the United Kingdom (UK), Belgium, and the Netherlands (GBIF.org, 2020), and the species' northern limit of distribution has continued to expand, with reports of these bugs in several Northern European countries, including Sweden and Norway (Mjøs et al., 2010). The occurrence records of WCSB were obtained from GBIF and cross-checked with other sources, including the Centre for Agriculture and Bioscience International (CABI) and previous studies, to minimize the uncertainty and bias caused by human accessibility (Barta, 2016; CABI, 2019; Mjøs et al., 2010). A total of 15,241 coordinates recorded from 1985 to 2020 in Europe were collected, and an additional 1,982 coordinates were obtained from North America to fit the parameters of the CLIMEX model (GBIF.org, 2020; Figure 1). Because uneven sampling needs to be adjusted to improve the reliability of model results, and because a sufficient number of occurrence coordinates was available, the data were spatially filtered with a 10 km radius buffer to reduce sampling bias and to avoid spatial autocorrelation (Beck et al., 2014; Boria et al., 2014; Brown et al., 2017; Kramer-Schadt et al., 2013). This filtering yielded a total of 1,939 data points, which were used as presence data (GBIF data). For the pseudo-absence data, we used ArcGIS (version 10.4.1; ESRI) to extract 1,939 points by undersampling from grid cells across Europe in which the WCSB was not recorded. We set the number of pseudo-absence points equal to the number of presence points to avoid biased learning caused by an imbalance between the two classes.

TABLE 1 Environmental parameter values associated with the occurrence of the western conifer seed bugs (Leptoglossus occidentalis) used in the CLIMEX model.
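The 10 km spatial filtering of occurrence records described in this section can be sketched as a greedy thinning procedure. This is an assumption about one possible implementation, not the authors' actual workflow: a record is kept only if it lies at least 10 km (great-circle distance) from every record already kept. The function names and example coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def thin_points(points, min_dist_km=10.0):
    """Greedy thinning: keep a point only if no kept point lies within min_dist_km."""
    kept = []
    for lat, lon in points:
        if all(haversine_km(lat, lon, k_lat, k_lon) >= min_dist_km for k_lat, k_lon in kept):
            kept.append((lat, lon))
    return kept


# Hypothetical records: two close together (about 1 km apart) and one about 40 km away.
records = [(50.85, 4.35), (50.86, 4.36), (51.22, 4.40)]
print(thin_points(records))
```

The result of greedy thinning depends on the order of the records, and the pairwise comparison scales quadratically, so a spatial index would be preferable for data sets of the size used here.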

| Constructing the RF algorithm
The RF algorithm is an ensemble algorithm used to perform regression or classification analysis based on the construction of a multitude of decision trees (Cutler et al., 2007). Compared with other machine learning algorithms, the RF algorithm facilitates a more accurate classification of species distribution (Park et al., 2016). When using the RF algorithm, each classification or regression tree is trained on a subset of the total data, and an out-of-bag (OOB) element is calculated to estimate the accuracy and error rate (Cutler et al., 2007; Garzon et al., 2006; Vincenzi et al., 2011). The number of trees (ntree) was varied from 50 to 400 in increments of 50, and the number of variables tried at each split (mtry) was set to 1, 2, 3, or 4. We then evaluated all 32 combinations of ntree and mtry, and the optimal structure was determined as the combination minimizing the OOB error rate with no signs of overfitting. For the purposes of the present study, we randomly divided the distribution data coding for the presence and pseudo-absence of the WCSB in Europe into training (2716) and test (1162) data at a ratio of 7:3. In addition, both the training and test data contained presence and pseudo-absence points at the same ratio of 5:5. The values of each of the eight variables at the presence or pseudo-absence points were extracted using ArcGIS. The ntree parameter was set to 100 trees, as this was found to minimize the OOB error rate, and mtry was set to 3, corresponding to the square root of the number of variables. mtry is generally set to one-third or the square root of the number of variables for classification (Cutler et al., 2007).
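The bookkeeping behind the data split and the hyperparameter grid can be sketched as follows. The fitting of the forest itself is omitted. The function name is hypothetical, and rounding the training share up within each class is an assumption that happens to reproduce the reported sizes of 2716 training and 1162 test points.

```python
import itertools
import math
import random


def stratified_split(n_presence, n_absence, train_fraction=0.7, seed=0):
    """Split each class separately so both sets keep the same class ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for label, n_points in ((1, n_presence), (0, n_absence)):
        indices = list(range(n_points))
        rng.shuffle(indices)
        n_train = math.ceil(train_fraction * n_points)   # assumption: round up
        train += [(label, i) for i in indices[:n_train]]
        test += [(label, i) for i in indices[n_train:]]
    return train, test


train, test = stratified_split(1939, 1939)
print(len(train), len(test))

# Candidate settings: ntree from 50 to 400 in steps of 50, mtry from 1 to 4.
grid = list(itertools.product(range(50, 401, 50), (1, 2, 3, 4)))
print(len(grid))

# Square-root rule for mtry with eight predictor variables.
print(round(math.sqrt(8)))
```

Each (ntree, mtry) pair in `grid` would be fitted on the training set and scored by its OOB error rate, with the lowest-error pair retained.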
Thereafter, we visualized the results of classification (presence and pseudo-absence coded as 1 and 0, respectively) on a map of Europe.
To assess the performance of the model based on sensitivity and specificity, we used a confusion matrix recording true positives (TPs), false positives (FPs), false negatives (FNs), and true negatives (TNs).
Subsequently, a receiver-operating characteristic (ROC) curve was plotted to depict the relationship between sensitivity and specificity (sensitivity vs. 1 − specificity) on a two-dimensional plane. The True Skill Statistic (TSS) was calculated as sensitivity + specificity − 1, and the area under the ROC curve (AUC) was measured (Allouche et al., 2006).
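The evaluation metrics described above can be computed as in the following minimal sketch. The confusion-matrix counts and prediction scores are hypothetical and are not the study's results; the AUC is computed with the rank-based (Mann-Whitney) formulation, which is equivalent to the area under the ROC curve.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, accuracy, and TSS from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)      # share of presences correctly predicted
    specificity = tn / (tn + fp)      # share of pseudo-absences correctly predicted
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "tss": sensitivity + specificity - 1.0,
    }


def auc_from_scores(labels, scores):
    """Probability that a random presence scores higher than a random pseudo-absence."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5           # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, for illustration only.
print(confusion_metrics(tp=80, fp=30, fn=20, tn=70))
print(auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

TSS ranges from −1 to 1, with 0 indicating performance no better than random, and, unlike accuracy, it does not depend on the ratio of presences to pseudo-absences.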
In addition, we examined the relative importance of the variables used in the model, according to the degree of improvement in model accuracy attributable to these variables and their contribution to the classification impurity, by means of the mean decrease in accuracy (MDA) and mean decrease in Gini (MDG), respectively (Cutler et al., 2007). The entire model construction process is illustrated in Figure 2.
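The quantity underlying MDG can be illustrated for a single split. In a random forest, the Gini decreases of all splits made on a given variable are accumulated and averaged over the trees; the sketch below shows only the per-split calculation for binary labels, with hypothetical function names and toy data.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (1 = presence, 0 = pseudo-absence)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


parent = [1, 1, 1, 0, 0, 0]

# A split that separates the classes perfectly removes all impurity.
print(gini_decrease(parent, [1, 1, 1], [0, 0, 0]))

# A weak split leaves most of the impurity in place.
print(round(gini_decrease(parent, [1, 1, 0], [1, 0, 0]), 4))
```

Variables that repeatedly produce large decreases of this kind receive a high MDG, whereas MDA is obtained by permuting a variable in the OOB data and measuring the resulting loss of accuracy.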
FIGURE 1 Distribution of western conifer seed bug in Europe.

| Environmental factors required for model operation
The survival rate, growth rate, and reproduction of WCSB depend on climate factors such as temperature and precipitation (Barta, 2016;Lesieur et al., 2019). In addition, the distribution of  (Barta, 2016).
By contrast, the minimum temperature variables were selected for August and December, which mark the beginning of the second generation and the entry into winter for WCSB, respectively (Barta, 2016). To capture seasonality, precipitation variables were chosen for each season.

| Potential geographical distribution of the western conifer seed bug in response to climate
We found that the potential distribution of the WCSB predicted using climate data based on CLIMEX modeling did not fit the actual range of WCSB distribution (Figure 4a). In particular, according to the model, the effective accumulated temperature required for the completion of a single WCSB generation was insufficient in the UK, Belgium, and the Netherlands. Even though we used the lowest value reported by Barta (2016) as the lower developmental threshold (LDT) and effective accumulated temperature, we were unable to simulate the potential distribution of WCSB such that it fitted the actual distribution. Agreement between the projection and the actual distribution could, nevertheless, be achieved by reducing the 13.4°C LDT reported by Barta (2016) by 3°C. However, the potential distribution without the inclusion of PDD could project more than 99% of the occurrence records (Figure 4b). In addition, the CLIMEX  Figure 4a). Furthermore, a latitude of approximately 54° N was predicted to be the northern limit of WCSB distribution, which implies that if the effective accumulated temperature is not satisfied, these bugs should not be able to survive in Denmark, Sweden, or Norway.

FIGURE 3 Potential distribution of the western conifer seed bug (WCSB: Leptoglossus occidentalis) under the current climate predicted based on the ecoclimatic index (EI) from the CLIMEX model, excluding population degree-days (PDD) for the United States of America.

| Analysis of occurrence (presence and pseudoabsence) maps and identification of important variables affecting western conifer seed bug distribution using the RF algorithm
Random forest modeling using test data predicted a higher WCSB occurrence in Northern Europe (the southern part of England, Belgium, and the Netherlands) than in Southern Europe, with 77.1% accuracy and an AUC of 0.82 for the presence of WCSB (Figure 5). The TSS was 0.54, which was relatively low but acceptable (Tobeña et al., 2016). In addition, the reliability of the RF model was evaluated using the OOB error rate for classifying either the presence or pseudo-absence of the WCSB, which was 21%.
in areas with an altitude of less than 50 m a.s.l. For example, the low-lying areas in northern Belgium had more records of WCSB occurrence than the higher altitude areas in southern Belgium.
By comparison, in South Korea, where WCSB generally occurs at altitudes of less than 200 m, altitude has been found to explain approximately 18% of WCSB occurrence (Kim et al., 2020).

| DISCUSSION
In this study, we assessed the utility of two modeling approaches in predicting the potential distribution and occurrence of the WCSB in Europe. CLIMEX models predict the potential distribution of a target species based on an area's climatic suitability, given the assumption that climate is a dominant factor determining habitat suitability for a specific species (Kriticos et al., 2015). The results obtained using CLIMEX models can be useful for determining areas where a species can initially be introduced, and thus we used a CLIMEX model to predict the suitability of European climates for the establishment of WCSB. The prediction results thus obtained indicated that most areas of Northern Europe would be climatically unsuitable for WCSB, owing to an insufficient number of PDD, which represents the effective accumulated temperature (Figure 4b). The lowest threshold temperature and effective accumulated temperature for WCSB have been reported to be approximately 14°C and 533 DD, respectively (Barta, 2016). In addition, Bernardinelli et al. (2006) have reported that the eggs of these bugs require an average of approximately 100 DD above 13.5°C to hatch, whereas 370 DD above 14.6°C is necessary to support development from first-instar nymphs to the adult stage. In Northern Europe, an average of 300-400 DD accumulates annually at a threshold of approximately 14°C, thereby indicating that an additional 100-200 DD would be necessary to support the completion of a single WCSB generation at these northern latitudes. Although there have been reports of the WCSB in Sweden and Norway, our results suggest that long-term establishment of this species at these northern latitudes is unlikely (Lindelöw & Bergsten, 2012; Mjøs et al., 2010).
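The degree-day shortfall quoted above can be checked with simple arithmetic. The sketch below uses the requirement of 533 DD (Barta, 2016) and, as a rough alternative, the sum of the egg and nymphal requirements reported by Bernardinelli et al. (2006), 100 + 370 = 470 DD. Summing the latter two values is a simplification introduced here, because they refer to slightly different thresholds (13.5°C and 14.6°C); the function name is hypothetical.

```python
def degree_day_shortfall(required_dd, available_dd_range):
    """Return the (smallest, largest) deficit for a range of available degree-days."""
    low, high = available_dd_range
    return (max(0.0, required_dd - high), max(0.0, required_dd - low))


available = (300.0, 400.0)   # annual DD accumulated in Northern Europe, as stated in the text

print(degree_day_shortfall(533.0, available))          # requirement from Barta (2016)
print(degree_day_shortfall(100.0 + 370.0, available))  # egg plus nymphal requirement
```

The two estimates give deficits of 133 to 233 DD and 70 to 170 DD, respectively, which bracket the additional 100-200 DD cited in the text.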
Whereas the southern part of the UK is considered to have adequate climatic conditions for the WCSB, a lack of sufficient degree-days was consistently indicated to prevent the completion of even one generation of the WCSB in northern areas of the UK. These results conceivably reflect the fact that CLIMEX modeling fails to take microclimatic conditions into consideration, or that the assessed parameters insufficiently reflect the characteristics of wild populations. With respect to the former of these two possibilities, CLIMEX modeling is based on the use of general atmospheric temperatures at a resolution of 0.1 decimal degrees, which would not reflect environmental conditions at the micro-habitat scale. Accordingly, the predictions obtained using this model are unlikely to provide a sufficiently accurate indication of the distribution of the first-generation WCSB. In the case of the latter explanation, studies on WCSB development reported to date have generally been performed in a laboratory setting rather than in the field. Thus, the data obtained, which are used for setting model parameters, are probably not fully representative of the characteristics of this species in its natural habitats (Johnson et al., 1995).
However, the contrasting CLIMEX projections of WCSB occurrence in Europe, obtained with and without the inclusion of PDD, indicate the feasibility of successful WCSB establishment in Northern Europe, as a consequence of phenotypic variability and adaptation to local climatic conditions (Lombardo & Elkinton, 2017). In this context, it has been reported that the adult size and development time of the mountain pine beetle Dendroctonus ponderosae (Hopkins) (Coleoptera: Curculionidae) are influenced by brood hosts rather than geographical origin, which is reflected in a more rapid rate of development in northern populations than in those distributed in southern areas (Bentz et al., 2001). Similarly, differences have been identified in the genetic variability of the red turpentine beetle Dendroctonus valens (LeConte) (Coleoptera: Curculionidae), depending on national or geographical location (Taerum et al., 2016).

FIGURE 5 Predicted probability map for occurrence of the western conifer seed bug (WCSB: Leptoglossus occidentalis) using the random forest (RF) algorithm.

FIGURE 6 The importance of variables in classifying the presence and pseudo-absence of the western conifer seed bug (WCSB: Leptoglossus occidentalis) by (a) mean decrease in accuracy (MDA) and (b) mean decrease in Gini (MDG). Monthly maximum temperature in March (tx3) and May (tx5); minimum temperature in August (tn8) and December (tn12); total precipitation in April (rr4), May (rr5), August (rr8), September (rr9), and October (rr10); radiation in May (qq5) and December (qq12); relative humidity in April (hu4) and December (hu12); digital elevation model (DEM); and distance from coniferous forest (Di).
Accordingly, we believe that modeling the potential distribution of the WCSB without taking PDD into consideration would be a more suitable approach, given that it includes most of the occurrence coordinates, which are likely to reflect phenotypic variability and climatic adaptation (Figure 4b). Alternatively, it is conceivable that the establishment of the WCSB in Northern Europe has been facilitated by features of the urban environment, such as buildings, houses, gardens, sheds, and wooden shutters, which enable these bugs to overcome unfavorable climates and accumulate a sufficient number of degree-days to complete at least one generation (Barta, 2016; Bernardinelli & Zandigiacomo, 2001; Blatt, 1994; Fent & Kment, 2011; Gall, 1992; Koerber, 1963; Lesieur et al., 2019; Tamburini et al., 2012; Wheeler, 1992).
In order to take appropriate preventive measures to limit the severe damage caused by the WCSB, it is initially necessary to identify the potential distribution and occurrence patterns of this species.
However, despite the advantages of species distribution modeling and the RF algorithm with respect to monitoring the potential distribution range of the WCSB, these approaches have not been widely used to date. Accordingly, widely contrasting results can be obtained when using these two models, which is well illustrated by the poten-
Although high population densities of WCSB recorded in a given area can be assumed to indicate the environmental suitability of the area, they could also be a consequence of uneven sampling, reflecting the likelihood that these bugs will be observed by humans, and thereby indicate the risk of incorrect projection or classification associated with the acquisition of nonsystematic or unverified presence data (Warton et al., 2013; Zaniewski et al., 2002). In the present study, 96.4% of WCSB occurrence records were human observations, which would tend to indicate a sampling bias toward residential areas where the WCSB might be more readily encountered (GBIF.org, 2020).
However, despite the drawbacks of biased sampling, presence data are still useful in developing a niche model, in that these are the only type of data that incorporate the environmental characteristics of occurrence areas, whereas pseudo-absence data are too ambiguous to be usefully defined (Ottaviani et al., 2004). Consequently, it is essential to minimize sampling bias when developing species distribution models that are generally dependent on presence data (Boria et al., 2014). Confirmation that the occurrence data obtained are those with the least uncertainty is generally considered sufficient to gain an insight into the distribution of a species, particularly when such modeling is combined with environmental variables (Bellamy et al., 2013; Lobo, 2016; Monk et al., 2010; Pearce & Boyce, 2006; Saito et al., 2012; Zaniewski et al., 2002).
Parameter estimation is a critical step in CLIMEX modeling because parameters that code species biology confine potential distributions of the species by evaluating species responses to climate.
Even though the current study attempted to collect sufficient biological data on the WCSB from available sources, it was unable to fully capture the biological characteristics of the entire European population. Furthermore, the biological characteristics of WCSB may vary across populations, necessitating a way to consider other factors producing this discrepancy, such as phylogenetic information collected from diverse regions (Ikeda et al., 2012). The foundation provided in this study makes it possible to develop an ensemble model that combines the CLIMEX and RF models, which would improve model reliability by reducing the uncertainty of the individual models (Kumar et al., 2015; Narouei-Khandan et al., 2020).

| CONCLUSION
In this study, we determined the potential distribution of the western conifer seed bug L. occidentalis and obtained predictive presence/pseudo-absence maps by identifying those environmental variables deemed to be of importance in determining WCSB distribution. We found that in most areas where the occurrence of the WCSB has been reported, the number of degree-days is probably insufficient for the WCSB to complete a single generation. These findings accordingly imply that, in addition to climate, phenotypic variability, adaptation to local climatic conditions, and environmental factors may, to differing extents, contribute to determining the establishment of WCSB in Northern Europe. It should, however, be noted that records of occurrence are dependent on human observations and are thus highly likely to reflect uneven sampling. Consequently, to minimize sampling bias, it is necessary to spatially process these records. The variables established to influence WCSB distribution, and the predictive presence/pseudo-absence maps thus obtained, can be used to screen European regions to identify those areas considered suitable for the expansion of WCSB, and thereby serve as management tools in efforts to minimize the damage caused to coniferous forests by this pest species.

CONFLICT OF INTEREST STATEMENT
The authors declare that they have no conflict of interest.

DATA AVAILABILITY STATEMENT
Datasets used in this study are available online in Dryad from https://doi.org/10.5061/dryad.dv41ns22r.